AITopics | dr 2

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

99e7e6ce097324aceb45f98299ceb621-Supplemental.pdf

Neural Information Processing Systems, Nov-15-2025, 06:05:49 GMT

altmingd, dr 2, probability, (15 more...)

Neural Information Processing Systems

Country:

Asia > India (0.04)
North America > United States > Illinois (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(2 more...)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Underdamped Langevin MCMC with third order convergence

Scott, Maximilian, O'Kane, Dáire, Jelinčič, Andraž, Foster, James

arXiv.org Machine Learning, Aug-25-2025

In this paper, we propose a new numerical method for the underdamped Langevin diffusion (ULD) and present a non-asymptotic analysis of its sampling error in the 2-Wasserstein distance when the $d$-dimensional target distribution $p(x)\propto e^{-f(x)}$ is strongly log-concave and has varying degrees of smoothness. Precisely, under the assumptions that the gradient and Hessian of $f$ are Lipschitz continuous, our algorithm achieves a 2-Wasserstein error of $\varepsilon$ in $\mathcal{O}(\sqrt{d}/\varepsilon)$ and $\mathcal{O}(\sqrt{d}/\sqrt{\varepsilon})$ steps respectively. Therefore, our algorithm has a similar complexity to other popular Langevin MCMC algorithms under matching assumptions. However, if we additionally assume that the third derivative of $f$ is Lipschitz continuous, then our algorithm achieves a 2-Wasserstein error of $\varepsilon$ in $\mathcal{O}(\sqrt{d}/\varepsilon^{1/3})$ steps. To the best of our knowledge, this is the first gradient-only method for ULD with third order convergence. To support our theory, we perform Bayesian logistic regression across a range of real-world datasets, where our algorithm achieves competitive performance compared to an existing underdamped Langevin MCMC algorithm and the popular No U-Turn Sampler (NUTS).
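As a rough illustration of how the three step-count rates quoted in this abstract compare, the sketch below evaluates them with every constant and logarithmic factor set to 1. That simplification is an assumption made here (the abstract states only the big-O rates), and the function name `uld_step_scaling` is hypothetical.

```python
import math


def uld_step_scaling(d: int, eps: float) -> dict:
    """Evaluate the quoted big-O step counts with all constants set to 1 (assumption)."""
    root_d = math.sqrt(d)
    return {
        # Lipschitz gradient only: O(sqrt(d) / eps)
        "lipschitz_gradient": root_d / eps,
        # Lipschitz gradient and Hessian: O(sqrt(d) / sqrt(eps))
        "lipschitz_hessian": root_d / math.sqrt(eps),
        # Additionally Lipschitz third derivative: O(sqrt(d) / eps^(1/3))
        "lipschitz_third_derivative": root_d / eps ** (1.0 / 3.0),
    }


if __name__ == "__main__":
    for name, steps in uld_step_scaling(d=100, eps=1e-3).items():
        print(f"{name}: about {steps:.0f} steps, up to constants")
```

The numbers are only meaningful relative to each other: for a fixed dimension, each extra smoothness assumption lowers the exponent on $1/\varepsilon$.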

artificial intelligence, machine learning, quicsort method, (14 more...)

arXiv.org Machine Learning

2508.16485

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Somerset > Bath (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Statistically and Computationally Efficient Linear Meta-representation Learning

Neural Information Processing Systems, Aug-16-2025, 07:08:01 GMT

In typical few-shot learning, each task is not equipped with enough data to be learned in isolation.

artificial intelligence, machine learning, probability, (16 more...)

Neural Information Processing Systems

Country:

Asia > India (0.04)
North America > United States > Illinois (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(2 more...)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition

Qin, Zhen, Zhu, Zhihui

arXiv.org Artificial Intelligence, Jun-9-2024

Recently, a tensor-on-tensor (ToT) regression model has been proposed to generalize tensor recovery, encompassing scenarios like scalar-on-tensor regression and tensor-on-vector regression. However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression. To overcome this hurdle, tensor decompositions have been introduced, with the tensor train (TT)-based ToT model proving efficient in practice due to reduced memory requirements, enhanced computational efficiency, and decreased sampling complexity. Despite these practical benefits, a disparity exists between theoretical analysis and real-world performance. In this paper, we delve into the theoretical and algorithmic aspects of the TT-based ToT regression model. Assuming the regression operator satisfies the restricted isometry property (RIP), we conduct an error analysis for the solution to a constrained least-squares optimization problem. This analysis includes upper error bound and minimax lower bound, revealing that such error bounds polynomially depend on the order $N+M$. To efficiently find solutions meeting such error bounds, we propose two optimization algorithms: the iterative hard thresholding (IHT) algorithm (employing gradient descent with TT-singular value decomposition (TT-SVD)) and the factorization approach using the Riemannian gradient descent (RGD) algorithm. When RIP is satisfied, spectral initialization facilitates proper initialization, and we establish the linear convergence rate of both IHT and RGD.

decomposition, regression, tensor, (13 more...)

arXiv.org Artificial Intelligence

2406.06002

Country:

North America > United States > Ohio (0.04)
Europe > Italy (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre:

Research Report (0.63)
Overview (0.45)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Provable Multi-Party Reinforcement Learning with Diverse Human Feedback

Zhong, Huiying, Deng, Zhun, Su, Weijie J., Wu, Zhiwei Steven, Zhang, Linjun

arXiv.org Machine Learning, Mar-7-2024

Reinforcement learning with human feedback (RLHF) is an emerging paradigm to align models with human preferences. Typically, RLHF aggregates preferences from multiple individuals who have diverse viewpoints that may conflict with each other. Our work initiates the theoretical study of multi-party RLHF that explicitly models the diverse preferences of multiple individuals. We show how traditional RLHF approaches can fail since learning a single reward function cannot capture and balance the preferences of multiple individuals. To overcome such limitations, we incorporate meta-learning to learn multiple preferences and adopt different social welfare functions to aggregate the preferences across multiple parties. We focus on the offline learning setting and establish sample complexity bounds, along with efficiency and fairness guarantees, for optimizing diverse social welfare functions such as Nash, Utilitarian, and Leximin welfare functions. Our results show a separation between the sample complexities of multi-party RLHF and traditional single-party RLHF. Furthermore, we consider a reward-free setting, where each individual's preference is no longer consistent with a reward model, and give pessimistic variants of the von Neumann Winner based on offline preference data. Taken together, our work showcases the advantage of multi-party RLHF but also highlights its more demanding statistical complexity.
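The three welfare functions named in this abstract can be illustrated with their common textbook definitions. The sketch below assumes those definitions (sum for Utilitarian, product for Nash with strictly positive utilities, lexicographic comparison of ascending-sorted utilities for Leximin); the paper's exact formulation may differ, and the outcome names and utility values are made up for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def utilitarian(utilities: Sequence[float]) -> float:
    """Utilitarian welfare: total utility across parties."""
    return sum(utilities)


def nash(utilities: Sequence[float]) -> float:
    """Nash welfare: product of utilities (assumes all are strictly positive)."""
    if any(u <= 0 for u in utilities):
        raise ValueError("Nash welfare in this sketch needs positive utilities")
    return math.prod(utilities)


def leximin_key(utilities: Sequence[float]) -> Tuple[float, ...]:
    """Leximin ordering key: compare the worst-off party first, then the next."""
    return tuple(sorted(utilities))


# Hypothetical per-party utilities for three candidate outcomes.
outcomes: Dict[str, Tuple[float, ...]] = {
    "a": (9.0, 1.0, 1.0),
    "b": (4.0, 3.0, 3.0),
    "c": (3.1, 3.1, 3.1),
}

best_utilitarian = max(outcomes, key=lambda k: utilitarian(outcomes[k]))
best_nash = max(outcomes, key=lambda k: nash(outcomes[k]))
best_leximin = max(outcomes, key=lambda k: leximin_key(outcomes[k]))

if __name__ == "__main__":
    print(best_utilitarian, best_nash, best_leximin)
```

With these made-up numbers each welfare function prefers a different outcome, which shows why the choice of aggregation rule matters when parties disagree.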

probability, social welfare function, welfare function, (15 more...)

arXiv.org Machine Learning

2403.05006

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)

Stochastic Localization via Iterative Posterior Sampling

Grenioux, Louis, Noble, Maxence, Gabrié, Marylou, Durmus, Alain Oliviero

arXiv.org Artificial Intelligence, Feb-16-2024

Building upon score-based learning, new interest in stochastic localization techniques has recently emerged. In these models, one seeks to noise a sample from the data distribution through a stochastic process, called the observation process, and progressively learns a denoiser associated with this dynamics. Apart from specific applications, the use of stochastic localization for the problem of sampling from an unnormalized target density has not been explored extensively. This work helps fill this gap. We consider a general stochastic localization framework and introduce an explicit class of observation processes, associated with flexible denoising schedules. We provide a complete methodology, Stochastic Localization via Iterative Posterior Sampling (SLIPS), to obtain approximate samples of this dynamics, and as a by-product, samples from the target distribution. Our scheme is based on a Markov chain Monte Carlo estimation of the denoiser and comes with detailed practical guidelines. We illustrate the benefits and applicability of SLIPS on several benchmarks, including Gaussian mixtures in increasing dimensions, Bayesian logistic regression and a high-dimensional field system from statistical mechanics.

algorithm, observation process, stochastic localization, (14 more...)

arXiv.org Artificial Intelligence

2402.10758

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

Zhao, Chen, Liu, Shuming, Mangalam, Karttikeya, Qian, Guocheng, Zohra, Fatimah, Alghannam, Abdulmohsen, Malik, Jitendra, Ghanem, Bernard

arXiv.org Artificial Intelligence, Jan-8-2024

Large pretrained models are increasingly crucial in modern computer vision tasks. These models are typically used in downstream tasks by end-to-end finetuning, which is highly memory-intensive for tasks with high-resolution data, e.g., video understanding, small object detection, and point cloud analysis. In this paper, we propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, a novel family of network architectures that acts as a surrogate network to finetune a pretrained model with substantially reduced memory consumption. Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible. Due to its reversibility, intermediate activations, which can be reconstructed from output, are cleared from memory during training. We use two coefficients on either type of residual connections respectively, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision. We evaluate Dr$^2$Net on various pretrained models and various tasks, and show that it can reach comparable performance to conventional finetuning but with significantly less memory usage.
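The memory saving described in this abstract rests on reversibility: if a block's input can be recomputed from its output, the input need not be stored during training. The sketch below shows that idea with a generic additive-coupling block in the style of earlier reversible networks. It is not the Dr$^2$Net dual-residual design and omits its two coefficients; the functions `f` and `g` and all names are placeholders chosen for illustration.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Fn = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def forward(x1: Vector, x2: Vector, f: Fn, g: Fn) -> Tuple[Vector, Vector]:
    """Additive coupling: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def inverse(y1: Vector, y2: Vector, f: Fn, g: Fn) -> Tuple[Vector, Vector]:
    """Recover the inputs from the outputs by undoing the two additions in reverse order."""
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


# Placeholder sub-networks; any deterministic functions work here.
def f(v: Vector) -> Vector:
    return [math.tanh(t) for t in v]


def g(v: Vector) -> Vector:
    return [0.5 * t for t in v]


if __name__ == "__main__":
    x1, x2 = [0.3, -1.2], [2.0, 0.7]
    y1, y2 = forward(x1, x2, f, g)
    print(inverse(y1, y2, f, g))  # matches (x1, x2) up to floating-point rounding
```

Reconstruction is exact in real arithmetic but only approximate in floating point, which is one reason numerical precision is a concern for reversible networks.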

architecture, dr 2, reversible network, (17 more...)

arXiv.org Artificial Intelligence

2401.04105

Country: Asia > Middle East > Saudi Arabia (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

On the Optimization and Generalization of Multi-head Attention

Deora, Puneesh, Ghaderi, Rouzbeh, Taheri, Hossein, Thrampoulidis, Christos

arXiv.org Machine Learning, Oct-19-2023

The training and generalization dynamics of the Transformer's core mechanism, namely the Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits of overparameterization when training fully-connected networks, we investigate the potential optimization and generalization advantages of using multiple attention heads. Towards this goal, we derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model, under a suitable realizability condition on the data. We then establish primitive conditions on the initialization that ensure realizability holds. Finally, we demonstrate that these conditions are satisfied for a simple tokenized-mixture model. We expect the analysis can be extended to various data-model and architecture variations.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2310.1268

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
North America > Canada > British Columbia (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)

Fast Interactive Search with a Scale-Free Comparison Oracle

Chumbalov, Daniyar, Klein, Lars, Maystre, Lucas, Grossglauser, Matthias

arXiv.org Artificial Intelligence, Jun-2-2023

A comparison-based search algorithm lets a user find a target item $t$ in a database by answering queries of the form, "Which of items $i$ and $j$ is closer to $t$?" Instead of formulating an explicit query (such as one or several keywords), the user navigates towards the target via a sequence of such (typically noisy) queries. We propose a scale-free probabilistic oracle model called $\gamma$-CKL for such similarity triplets $(i,j;t)$, which generalizes the CKL triplet model proposed in the literature. The generalization affords independent control over the discriminating power of the oracle and the dimension of the feature space containing the items. We develop a search algorithm with provably exponential rate of convergence under the $\gamma$-CKL oracle, thanks to a backtracking strategy that deals with the unavoidable errors in updating the belief region around the target. We evaluate the performance of the algorithm both over the posited oracle and over several real-world triplet datasets. We also report on a comprehensive user study, where human subjects navigate a database of face portraits.

artificial intelligence, machine learning, probability, (17 more...)

arXiv.org Artificial Intelligence

2306.01814

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Data-Driven Response Regime Exploration and Identification for Dynamical Systems

Farid, Maor

arXiv.org Artificial Intelligence, Apr-6-2023

Data-Driven Response Regime Exploration and Identification (DR$^2$EI) is a novel and fully data-driven method for identifying and classifying response regimes of a dynamical system without requiring human intervention. This approach is a valuable tool for exploring and discovering response regimes in complex dynamical systems, especially when the governing equations and the number of response regimes are unknown, and the system is expensive to sample. Additionally, the method is useful for order reduction, as it can be used to identify the most dominant response regimes of a given dynamical system. DR$^2$EI utilizes unsupervised learning algorithms to transform the system's response into an embedding space that facilitates regime classification. An active sequential sampling approach based on Gaussian Process Regression (GPR) is used to efficiently sample the parameter space, quantify uncertainty, and provide optimal trade-offs between exploration and exploitation. The performance of the DR$^2$EI method was evaluated by analyzing three established dynamical systems: the mathematical pendulum, the Lorenz system, and the Duffing oscillator. The method was shown to effectively identify a variety of response regimes with both similar and distinct topological features and frequency content, demonstrating its versatility in capturing a wide range of behaviors. While it may not be possible to guarantee that all possible regimes will be identified, the method provides an automated and efficient means for exploring the parameter space of a dynamical system and identifying its underlying "sufficiently dominant" response regimes without prior knowledge of the system's equations or behavior.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2304.05822

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Scientific Computing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
